Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF

Author

  • Luděk Müller
Abstract

The paper describes the system built by the team from the University of West Bohemia for participation in the CLEF 2006 CL-SR track. We decided to concentrate only on monolingual search in the Czech test collection and to investigate the effect of proper language processing on retrieval performance. For that purpose, we employed a Czech morphological analyser and tagger. For the actual search system, we used the classical tf.idf approach with blind relevance feedback, as implemented in the Lemur toolkit. The results indicate that suitable linguistic preprocessing is indeed crucial for Czech IR performance.
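The pipeline the abstract outlines (lemmatize Czech word forms, rank with tf.idf, then re-rank after blind relevance feedback) can be illustrated with a small self-contained sketch. This is not the authors' system and does not use Lemur. The `LEMMAS` table is a hypothetical stand-in for a real Czech morphological analyser and tagger, and the feedback step is a Rocchio-style query expansion with assumed parameters `fb_docs`, `fb_terms`, and `beta`; Lemur's own tf.idf and feedback weighting differ in detail.

```python
import math
from collections import Counter

# Hypothetical toy lemma table standing in for a real Czech morphological
# analyser and tagger (assumption, for illustration only).
LEMMAS = {
    "válce": "válka", "válku": "válka", "válkou": "válka",
    "tábora": "tábor", "táboře": "tábor",
    "rodiny": "rodina", "rodinou": "rodina",
}


def lemmatize(text):
    """Lower-case, split on whitespace, and map each word form to its lemma."""
    return [LEMMAS.get(w, w) for w in text.lower().split()]


def build_index(docs):
    """Return per-document lemma counts and lemma document frequencies."""
    tfs = [Counter(lemmatize(d)) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())
    return tfs, df


def idf(term, df, n_docs):
    """Plain log(N / df) inverse document frequency; 0 for unseen terms."""
    return math.log(n_docs / df[term]) if df[term] else 0.0


def score(query_weights, tf, df, n_docs):
    """tf.idf dot product between a weighted query and one document."""
    return sum(w * tf[t] * idf(t, df, n_docs) for t, w in query_weights.items())


def search(query, docs, fb_docs=2, fb_terms=3, beta=0.5):
    """Rank docs by tf.idf, then re-rank after Rocchio-style blind feedback."""
    tfs, df = build_index(docs)
    n = len(docs)
    q = Counter(lemmatize(query))
    first = sorted(range(n), key=lambda i: score(q, tfs[i], df, n), reverse=True)
    # Blind feedback: assume the top fb_docs documents are relevant and
    # average their tf.idf term weights.
    pooled = Counter()
    for i in first[:fb_docs]:
        for t, c in tfs[i].items():
            pooled[t] += c * idf(t, df, n) / fb_docs
    # Add the strongest fb_terms feedback terms to the query, scaled by beta.
    expanded = Counter(q)
    for t, w in pooled.most_common(fb_terms):
        expanded[t] += beta * w
    return sorted(range(n), key=lambda i: score(expanded, tfs[i], df, n), reverse=True)


docs = [
    "vzpomínky na život v táboře",
    "rodina přežila válku v táboře",
    "po válce se rodina vrátila domů",
    "recept na bramborovou polévku",
]
print(search("válkou rodinou", docs))
```

With the toy table, the inflected query `válkou rodinou` still matches the documents containing `válku`, `válce`, and `rodina`, because all of these forms map to the same lemmas. Without the mapping, neither query word form occurs anywhere in the collection, which is the kind of mismatch the abstract attributes to missing linguistic preprocessing.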


Similar Resources

Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF 2006


University of Chicago at the CLEF 2007 Cross Language Speech Retrieval Track

The University of Chicago participated in the CLEF 2007 CL-SR track, performing monolingual retrieval for both English and Czech and cross-language French-English retrieval. English experiments considered the impact of automatically generated keywords on retrieval. Czech experiments explored the effect of different stemming approaches on retrieval for this morphologically rich language. The bes...


Charles University at CLEF 2007 CL-SR Track

This paper describes a system built at Charles University in Prague for participation in the CLEF 2007 Cross-Language Speech Retrieval track. We focused only on monolingual searching of the Czech collection and used the LEMUR toolkit as the retrieval system. We employed our own morphological tagger and lemmatized the collection before indexing to deal with the rich morphology in Czech which significa...


Experiments for the Cross Language Speech Retrieval Task at CLEF 2006

This paper presents the second participation of the University of Ottawa group in the Cross-Language Speech Retrieval (CL-SR) task at CLEF 2006. We present the results of the submitted runs for the English collection and very briefly for the Czech collection, followed by many additional experiments. We have used two Information Retrieval systems in our experiments: SMART and Terrier, with sever...


Brown at CL-SR'07: Retrieving Conversational Speech in English and Czech

Brown’s entry to the Cross-Language Speech Retrieval (CL-SR) track at the 2007 Cross Language Evaluation Forum (CLEF) was based on the language model (LM) paradigm for retrieval [17]. For English, our system introduced two minor enhancements to the basic unigram: we extended Dirichlet smoothing (popular with unigram modeling) to bigrams, and we smoothed the collection LM to compensate for the s...




Publication date: 2015